This page is an index to the supplementary materials for the article “Tsi in Yiddish”.

## Data used for the research
For this project, I used different sources for different languages. For all the languages except Ukrainian, the initial sources were texts, which were subsequently processed with a Python script.
For Ukrainian, we used the GRAC corpus: [Maria Shvedova, Ruprecht von Waldenfels, Sergiy Yarygin, Mikhail Kruk, Andriy Rysin, Michał Woźniak (2017-2018): GRAC: General Regionally Annotated Corpus of Ukrainian. Electronic resource: Kyiv, Oslo, Jena. Available at uacorpus.org].
From there, I extracted questions using the following CQL query:
<s> []{1,15} [word =="?"]
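Read literally, the query asks for a sentence start, then 1 to 15 arbitrary tokens, then a token that is exactly "?". A minimal Python sketch of the same condition follows. It assumes each sentence is already a list of tokens; the function name is hypothetical and this is not how the corpus engine evaluates the query.

```python
def matches_question_query(tokens, max_gap=15):
    """Approximate `<s> []{1,15} [word=="?"]` on one tokenized sentence.

    Assumption: `tokens` holds the tokens of a single sentence, in order.
    """
    # The "?" must be preceded by 1..max_gap tokens counted from the
    # sentence start, so its 0-based index lies between 1 and max_gap.
    return any(tok == "?" for tok in tokens[1:max_gap + 1])


# Hypothetical tokenized sentences.
print(matches_question_query(["Чи", "ти", "тут", "?"]))  # question mark at index 3
print(matches_question_query(["?"]))                      # nothing precedes the "?"
```

Under this reading, a sentence whose first question mark comes later than the sixteenth token is not matched.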
Then, I wrote a Python script to extract question particles, such as А, Чи, and Невже, from the questions. A sample output:
| AUTHOR | TITLE | QWORD | QUOTE | Birthreg | POETRY | GENERIC_QWD | NEGATIVE | RHET |
|---|---|---|---|---|---|---|---|---|
| Segalovich | Antosha-novels | QWORD | װעמען פארטײדיקט ער ? | | NA | QWORD | NA | NA |
| Borokhov Ber | Geklibene Shriften 1905-1914 | CZY | צי קען די עמיגראַציאָנס-אָרגאַניזאַציע זײן רײן-דעמאָקראַטיש ? | UA-CRK | NA | CZY | NA | NA |
| Segalovich | Antosha-novels | NOQWORD | —לערן זיך,——האָט אים אַ חבר געראַמען—געלט האָסטו, צײט —איצטער זיך לערנען? | | NA | NOQWORD | NA | NA |
| Sforim | Letzte shriften 1904-1917 | QWORD | אַ פאָהר אַהין אַ פאָהר צוריק, אַײן עס, אַ טרונק קאָסט דאָך געלד, און װוּ זענען װײבּער ? | | NA | QWORD | NA | NA |
| Markish | Dor oys dor eyn 1929 FIC | QWORD | בּא יעדערן זיך אָפּגעשטעלט, ארײנגעקוקט אינעװײניק אין שטױבּיק שרײענדיקע אינגעװײדן פון אָט דעם נײעם מין גאָלעס, געטײלט <quote>גוט י מאָרגנס<quote> די אָרעמע װאַנדערער און זיך פאַנאַנדערגעפרעגט : פונװאַנען, אידן ? | | NA | QWORD | NA | NA |
| A. Reisen | Dertseylungen | QWORD | אום האַרבםט, װען דער װינט פלעגט בלאָזן, פלעגן דעמאָלט די צװײגן זיך שאָקלען און רױשן טרויעריק אין פענצטער אַריין, װי זײ װאָלטן עפעם אָן אומעטיקע לאָנגאָנענדיקע מײםע װעלן כאָ? | BY-MSQ | NA | QWORD | NA | NA |
| Varshavski Oyzer | Shmuglars 1920 | QWORD | —גײ שױן, גרונם, װאָס איז עס פאַר אַ לשון? | | NA | QWORD | NA | NA |
| Markish | Dor oys dor eyn 1929 FIC | QWORD | גזאָמער װאָס ? | | NA | QWORD | NA | NA |
| Vergelis | Oyg oyf Oyg collection POE | NOQWORD | זאָט דעו געקאָנט עש דשאַליו פאַרתאַלשו ן זאָט דעו געקאָנט עע זשאַליו צעשפּאַלטו ? | | YPOETRY | NOQWORD | NA | NA |
| Markish | Der Trot fun Doyres 1947 FIC | QWORD | ס’איז אים עפּעס געװאָרן טײַערער; ער האָט פונדאָסנײַ באטראכט דעם באלקן, די ערטערװײַז שימלענדיקע װענט און צװישן אײן װאנט און דער צװײטער ארײַנגעגאנװעט א קוק אפן קװארטיראנט אלײן, װאָס האָט אים איצט אױסגעזען א סאך גרעסער, װי ער איז, און פארמאכנדיק דעם בוך מיט אן איבערגעלאָזטן פינגער אינװײניק, האָט ער א זאָג געטאָן שטראלנדיק: — הײסט עס, אף איבערקערן די װעלט מאכט ניט אױס, אז מע איז א ייִִד און דע-גלײַכן? | | NA | QWORD | YNEG | NA |
| Markish | Dor oys dor eyn 1929 FIC | NOQWORD | ס’װאָלט דיר ניט געפאסט זײן קײן זאַװאָדטשיק, מײנסטו, האַ ? | | NA | NOQWORD | YNEG | NA |
| Segal Kalman | Getraye Libe 1960 FIC | QWORD | האָסט געזען אַמאָל אַ קלײן קינד, װאָס איז נרשט געבױרן גערואָרן? | | NA | QWORD | NA | NA |
| A. Reisen | Dertseylungen | QWORD | — מיינער, װאָם הײםט מי״יגער? | BY-MSQ | NA | QWORD | NA | NA |
| Sutskever | Valdiks 1937-1939 POE | CZY | צי איז דײַן פּנים אײנערלײ מיט מײַנעם? | | YPOETRY | CZY | NA | NA |
| Spektor Mordechai | Elnte un Farshtosene | CZY-dep | װער װײסט צי זי װעט מיך שױן גאָר אין שטובּ אַרײנלאָזן און װוּ אַהין-זשע האָבּ איך דען צו גײן, אַז נישט צו „דער פלײשיקער ? | | NA | CZY-dep | YNEG | YRHET |
| A. Reisen | Naye Verk | NOQWORD | בּרוקירען ? | | NA | NOQWORD | NA | NA |
| Sholem Ash | Motiven | QWORD | — נו , װײַס ניט נאָט װאָס ער טוט ? | | NA | QWORD | YNEG | NA |
| Spektor Mordechai | Shmad un Fertsvayflung | QWORD | — פאַניע סטאַרשינע,—האָט דבורה זיך אָנגערופן צו אים מיט אַירחמנות רחמנות פנים, װי גלײך דאָס גרעסטע אומגליק האָט זי געטראָפן, - אפשר װאָלסטו אים אָפּגעלאָזט פרײ ? | | NA | QWORD | NA | NA |
| Varshavski Oyzer | Shmuglars 1920 | QWORD | זעען יענע ייִדן, װי די חברה װאַקסט, נעמען זײ זי * דיקן אױף הינטער און ס’געפינען זיך שױן אַ פּאָר בּעליבּתישע מענטשלאַך, װאָס טענהן : —צי װאָס האָט דאָס געטױגט ? | | NA | QWORD | NA | NA |
| Sholem | Blondzhe Stern | QWORD | <quote>זיץ, מאַכט צו אים דער פּאָרעץ, זיץ, װאָס שטײסטו ? | | NA | QWORD | NA | NA |
| Dinezon | Zikrones nybc 207361 | QWORD | מאַכט זיך אַמאָל, עס עפנט בּײ אים אַ שטיװל דאָס מױל, אַ גאָט רײסט זיך, אָדער זײנע אַרבּל האָבּן זיך אױסגעריבּן און דער אונטערשלאַק הײבּט זיך אָן ארױסצוּװײזן, זאָגט בּרוך : — צו װאָס האָט גאָט, בּרוך הוא, בּאַשאַפן אַ נאָדל-פאָדם מיט לאַטקעלעך ? | | NA | QWORD | NA | NA |
| Ettinger | Serkele | QWORD | װאָס פֿעלט אײַך, װאָס? | | NA | QWORD | NA | NA |
| Dinezon | Alter | QWORD | — װאָס-זשע טוט מען פאָרט, ער זאָל יענע פאַרגעסן? | | NA | QWORD | NA | NA |
| Linetski | Dos poylishe yungel 1867 | NOQWORD | האַ ? | | NA | NOQWORD | NA | NA |
| Markish | Milkhome Stalingrad POE | QWORD | איז װעלכער װינט האָט זיך ניט אָפּגעשטעלט אַליע ? | | YPOETRY | QWORD | YNEG | NA |
| Spektor Mordechai | Yiddishe Tekhter | QWORD | װאָס טוט מען גישבּ פון אַ קינדס װעגן ? | | NA | QWORD | NA | NA |
| Dinezon | Alter | QWORD | זעט ער זי, קוקט זי אַלעמאָל אױף אים, נאָר װאָס איז דאָפּ זײן דאגה ? | | NA | QWORD | NA | NA |
| Perets | Geklibene verk nybc 209359 | QWORD | — װאָס װעל איך דאָרט טאָן ? | | NA | QWORD | NA | NA |
| Perets | Geklibene verk nybc 209359 | NOQWORD | פּערע ? | | NA | NOQWORD | NA | NA |
| Eliezer Steinbarg | Mayselekh kindertales | QWORD | האַלט דער נאָװי זיך מיטן פעדערל אין מױל און װײסט ניט, נעבעך, װאָס מיט אים צו טון : אַװעקװאַרפן ? | | NA | QWORD | YNEG | NA |
| Linetski | Der Pritshepe 1876 | QWORD | — נאָר דיא קשיא, װאָס נאָך קען מען אױפטיהן פֿאַר גײעס אין אחשורוש-שפיעל, װאָס אַללע פורים שפיעלער האָבען שון בין אַהער ניט אױפֿגעטיהן ? | | NA | QWORD | YNEG | NA |
| Kaczerginski | Grine legende stories 1943 | QWORD | װי האָבן זײ זיך צוזאמענגערעדט? | | NA | QWORD | NA | NA |
| Vergelis | Notizen vegn a seyder SCI | CZY | צי איז ער גיט ניכשל געװאָרן פון די קאטױלישע פּאָסטולאטן? | | NA | CZY | NA | NA |
| Ester Kreitman | Briliantn 1944 FIC | NOQWORD | װײניק שידוכים, אסתר קרייטמאַן מײנט איר, האָב איך אים שױן אַלײן גערעדט ? | | NA | NOQWORD | NA | NA |
| Sholem Aleykhem lat | Tevye | NOQWORD | — azoy? | | NA | NOQWORD | NA | NA |
| Vergelis | Di tsayt FIC 1981 | QWORD | — װען פאָרט איר אין ביראָבידזשאנער ראיאָן און אף װאָס פאר א שטעלע? | | NA | QWORD | NA | NA |
| Sutskever | Yidishe Gas 1941-1947 POE | NOQWORD | און װעדליק צײט ? | | YPOETRY | NOQWORD | NA | NA |
| Eliezer Steinbarg | Mesholim POE | NOQWORD | נײן ? | | YPOETRY | NOQWORD | NA | NA |
| Sforim | di takse 1869 | QWORD | װאס איז איהר האט אױך אַ כבוד ? | | NA | QWORD | NA | NA |
| Linetski | Dos poylishe yungel 1867 | QWORD | אײ פֿון װאס ? | | NA | QWORD | NA | NA |
| Spektor Mordechai | Soydes | QWORD | גאָר װאָס װעל איך אײך גײן דערצײלן? | | NA | QWORD | NA | NA |
| Fefer | Rotarmeyish 1944 POE | QWORD | װער װײסט ? | | YPOETRY | QWORD | NA | NA |
| Segalovich | Antosha-novels | QWORD | װוּ איז דאָס אַהינגעקומען ? | | NA | QWORD | NA | NA |
| Vergelis | Reyzes | NOQWORD | — אראָפּנעמען דעם כײרעם, אין װעלכן מע האָט שפּינאָזען ארײַנגעלײגט מיט דרײַ הונדערט יאָר צוריק? | | NA | NOQWORD | NA | NA |
| Markish | Der Trot fun Doyres 1947 FIC | NOQWORD | פארשטײסט? | | NA | NOQWORD | NA | NA |
| Perets | Geklibene verk nybc 209359 | CZY | צי ליגט נישט אין אים דער שמערץ פאַר אַלע, דאָס לײדן פאַרן גאַנצן דאָר ? | | NA | CZY | YNEG | NA |
| Sforim | Letzte shriften 1904-1917 | NOQWORD | אבער קידוש מאַכט רב יודעל ? | | NA | NOQWORD | NA | NA |
| Horonczyk | In geroysh fun mashinen | NOQWORD | ער האָט לאנג דך געשלאָגן מיט דער דעה, צי ער זאָל פרעגן, און ערשע נאָך אַ לאַנגן איבּערלײגן זיך, האָט ער אַ פרעג געטאָן: — און די פאַרטײ אַרבּעט — פירט אָן סעמערל ? | | NA | NOQWORD | NA | NA |
| A. Reisen | Dertseylungen | NOQWORD | — אַן אומגליק? | BY-MSQ | NA | NOQWORD | NA | NA |
| Dinezon | Tsvey mames | QWORD | — פאַרװאָס ? | | NA | QWORD | NA | NA |
The czyperc score was defined in two ways. In both formulas, each label stands for the number of questions carrying that label. First, for the parts focusing on Yiddish, we defined czyperc as the share of CZY questions proper, excluding the CZY-dep and CZY-or subtypes, which were manually labeled in the Yiddish part of the database. The formula is:
\[\mathrm{czyperc} = \frac{CZY}{NOQWORD + CZYdep + CZYor + CZY}\]
The second definition was used for the Slavic languages and for Yiddish in comparison with them. As the types of “tsi”-kind particle were not labeled in this part of the database, all types of CZY were counted together. The formula for this czyperc is:
\[\mathrm{czyperc} = \frac{CZY + CZYdep + CZYor}{NOQWORD + CZYdep + CZYor + CZY}\]
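As a worked example, the two definitions can be compared on the same set of counts. The sketch below is in Python; the counts are invented for illustration and are not taken from the dataset, and the function names are hypothetical.

```python
def czyperc_yiddish(counts):
    """First definition: only CZY proper goes in the numerator."""
    total = counts["NOQWORD"] + counts["CZY-dep"] + counts["CZY-or"] + counts["CZY"]
    return counts["CZY"] / total if total else float("nan")


def czyperc_comparative(counts):
    """Second definition: all CZY subtypes are counted together."""
    total = counts["NOQWORD"] + counts["CZY-dep"] + counts["CZY-or"] + counts["CZY"]
    czy_all = counts["CZY"] + counts["CZY-dep"] + counts["CZY-or"]
    return czy_all / total if total else float("nan")


# Invented counts for one author (assumption, for illustration only).
example_counts = {"NOQWORD": 80, "CZY": 10, "CZY-dep": 6, "CZY-or": 4}

print(czyperc_yiddish(example_counts))      # 10 / 100
print(czyperc_comparative(example_counts))  # 20 / 100
```

Both definitions share the same denominator, which leaves out wh-questions (QWORD), so the second score is never smaller than the first.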
The “raw” dataset was used to produce a summary table for further analysis. The general formula for that was:
In the example above, a condition is used to exclude poetical works from the table (POETRY!=“YPOETRY”). Poetry texts were processed separately from the fiction texts of the same authors and marked as a distinct author (for example, Sutskever and Sutskever POE).
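A minimal Python sketch of this kind of summary follows. It is written for illustration and is not the code used for the article. The dict-based row format, the function name `summarize` and the example rows are assumptions; the column names follow the sample table.

```python
from collections import Counter, defaultdict


def summarize(rows, include_poetry=False):
    """Count GENERIC_QWD labels per author.

    Poetry rows (POETRY == "YPOETRY") are skipped unless include_poetry is
    True, in which case they are filed under a separate "<author> POE" entry.
    """
    table = defaultdict(Counter)
    for row in rows:
        is_poetry = row["POETRY"] == "YPOETRY"
        if is_poetry and not include_poetry:
            continue
        name = (row["AUTHOR"] + " POE") if is_poetry else row["AUTHOR"]
        table[name][row["GENERIC_QWD"]] += 1
    return {name: dict(counts) for name, counts in table.items()}


# Hypothetical rows, reduced to the columns the summary needs.
example_rows = [
    {"AUTHOR": "Sutskever", "POETRY": "YPOETRY", "GENERIC_QWD": "CZY"},
    {"AUTHOR": "Sutskever", "POETRY": "NA", "GENERIC_QWD": "NOQWORD"},
    {"AUTHOR": "Sutskever", "POETRY": "NA", "GENERIC_QWD": "CZY"},
]

print(summarize(example_rows))
print(summarize(example_rows, include_poetry=True))
```

The per-author counts produced this way are the inputs that a czyperc score needs.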
The same steps were applied to the “raw” datasets for the other languages. For the Slavic languages and Yiddish in comparison with them, the resulting table is presented below:
For the Yiddish-centered part of the research, the table was more complete and included two scores: czyperc for CZY-type questions, and czypercDO, the same metric calculated for CZY-dep and CZY-or questions:
For the analysis of rhetorical questions, a reduced version of the table above was used, with poetry texts excluded.
Figure 1: Distribution of tsi as a Yes/No question particle in dialects of Yiddish among writers. Region of birth and date of birth are given.
figure1 <- ggplot(writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"], aes(x=Dialect, y=czyperc, color =Dialect)) + geom_boxplot() + geom_text(aes(label=paste(rn, Birthreg,Birthdate, sep = "-")))
ggplotly(figure1)
Figure 2: Diachronic development of the use of tsi as a question particle.
figure2 <- ggplot(writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber"], aes(x=Birthdate, y=czyperc,color = Dialect)) + geom_point() + geom_text(aes(label=rn)) + stat_smooth(method = 'lm',se = FALSE) + ylim(0,0.25) + stat_smooth(aes(x = writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber",Birthdate], y = writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber",czyperc], color ="general"), method = "lm",se = FALSE,size=1,fill="black",colour="black")
ggplotly(figure2)
Figure 3: Diachronic development of tsi as a disjunctive connector or complement clause marker.
figure3 <- ggplot(writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber"], aes(x=Birthdate, y=czypercDO,color = Dialect)) + geom_point() + geom_text(aes(label=rn)) + stat_smooth(method = 'lm',se = FALSE) + ylim(0,0.25) + stat_smooth(aes(x = writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber",Birthdate], y = writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber",czypercDO], color ="general"), method = "lm",se = FALSE,size=1,fill="black",colour="black")
ggplotly(figure3)
Figure 4: Diachronic development of different uses of tsi. Blue line: czypercDO for a given birthdate; red line: czyperc for a given birthdate.
figure4 <- ggplot(writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber"], aes(x=Birthdate, y=czypercDO)) + geom_point() + geom_text(aes(label=rn)) + stat_smooth(method = 'loess',se = FALSE) + ylim(0,0.25) + stat_smooth(aes(x = writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber",Birthdate], y = writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber",czyperc], color = "red"), method = "loess", se = FALSE) + scale_shape_discrete(name="czypercDO vs CZYPERC", breaks = c("colour"))
ggplotly(figure4)
Figure 5: Tsi as a Yes/No question particle (czyperc) vs. tsi in other functions (czypercDO).
figure5 <- ggplot(writerstable_ypoe[POETRY!="YPOETRY"&Birthdate!="NA"&rn!="Borokhov Ber"], aes(x=czypercDO, y=czyperc)) + geom_point() + geom_text(aes(label=paste(rn,Birthdate,sep ="-")))+ stat_smooth(method = 'lm') + ylim(0,0.25)
ggplotly(figure5)
Figure 6: Tsi as Yes/No question particle in poetry (YPOETRY) and fiction (NA).
figure6 <- ggplot(writerstable_ypoe[rn%in%c("Markish","Markish POE", "Sutskever", "Sutskever POE","Eliezer Steinbarg","Eliezer Steinbarg POE", "Fininberg", "Fininberg POE", "J. Glatstein", "J. Glatstein POE", "Vergelis", "Vergelis POE", "Perets", "Perets POE", "Kulbak", "Kulbak POE")], aes(x=POETRY, y=czyperc)) + geom_boxplot() + geom_text(aes(label=paste(rn, Birthreg,Birthdate,paste("ALLSUM:",ALLSUM,sep = ""), sep = "-")))
ggplotly(figure6)
Figure 7: Diachronic development of Yes/No question tsi with rhetorical semantics. Blue curve: percent of rhetorical questions with tsi (rhetperc); red curve: percent of rhetorical questions with tsi in other functions or without a particle (rhetpercNOQWORD); black curve: general czyperc rate.
figure7 <- ggplot(writerstable_with_rhet[rhetsum>8],aes(x=Birthdate, y=rhetperc,colour=Dialect,group = 1)) + geom_point() + geom_text(aes(label=rn)) + stat_smooth(method='loess', se = F) + stat_smooth(aes(x = writerstable_with_rhet[rhetsum>8,Birthdate], y = writerstable_with_rhet[rhetsum>8,rhetpercNOQWORD]), method = "loess",se = FALSE,size=1,fill="black",colour="red") + stat_smooth(aes(x = writerstable_with_rhet[rhetsum>8,Birthdate], y = writerstable_with_rhet[rhetsum>8,czyperc]), method = "loess",se = FALSE,size=2,colour="black")
ggplotly(figure7)
Figure 8: “Tsi”-kind particles in all functions in selected languages.
figure8 <- ggplot(writerstable_belyidpolukr[Birthdate>1800&POETRY=="NA"], aes(x=LANG, y=czyperc,color = LANG)) + geom_boxplot() + geom_jitter(aes(label=rn), alpha = 0.8)
ggplotly(figure8)
Figure 9: Diachronic development of “tsi”-kind particles in all functions in selected languages.
figure9 <- ggplot(writerstable_belyidpolukr[Birthdate!="NA"&POETRY=="NA"&rn!="Bruno Schulz"&rn!="Borokhov Ber"&LANG%in%c("YID","BEL","PL","UKR")], aes(x=as.numeric(as.character(Birthdate)), y=as.numeric(as.character(czyperc)),color = LANG)) + geom_text(data=writerstable_belyidpolukr[Birthdate!="NA"&POETRY=="NA"&rn!="Bruno Schulz"&rn!="Borokhov Ber"&LANG%in%c("YID","PL","BEL")],aes(label=rn),check_overlap = TRUE) + stat_smooth(method = 'loess',se=F) + ylim(0,0.5) + labs(x = "Birthdate", y = "czyperc")
ggplotly(figure9)
Comments on the data files used: raw dataset (with the question text string)
Below is information about the data files used in this research: both the original data file produced by a Python script (an extract is shown above) and the summary tables used for the plots (see the interactive plots on this page).
Author: The writer of a given text.
Title: The title of a given text.
Genre: The genre of a text. Fiction texts represent the main share of our database, while poetry is excluded from it completely.
Quote: The question sentence extracted from the text or, for Ukrainian, from the GRAC corpus.
Qword, GENERIC_QWD: Type of the question string, as determined by the Python script and then, for the Yiddish material, manually altered. For Yiddish, QWORD is a “raw” variable that is not used for quantitative analysis but serves as an index of the full variability of the questions. To simplify the quantitative analysis, all the values available for the QWORD variable were merged into the GENERIC_QWD variable.
The GENERIC_QWD variable takes the following values:
1. QWORD - A wh-word of any type (‘how’, ‘where’, ‘why’ and so on) is found in the question string. For the Yiddish data, if another question particle (tsi, czy and so on) is also present, the question is assigned to the CZYQWORD group and then manually moved to either the CZY or the QWORD group. For other languages this procedure was not performed and, for now, the CZYQWORD group is merged with QWORD.
2. NOQWORD - A question without any question particle (counted in our analysis).
3. CZY - One instance of the Чи particle is found in the question string.
4. CZY-dep - Tsi is used as a complement clause marker.
5. CZY-or - Tsi is used as a disjunctive connector.
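The automatic part of this labeling can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used for the article: the word lists are deliberately tiny and incomplete, the function names are hypothetical, and the CZY-dep and CZY-or subtypes are left out because they were assigned manually.

```python
# Illustrative, incomplete word lists (assumptions, not the real lexicon).
TSI = "צי"
VOS = "װאָס"
WH_WORDS = {VOS, "װער", "װי"}
PARTICLES = {TSI}


def classify(tokens):
    """Assign a raw label to a tokenized question."""
    has_wh = any(t in WH_WORDS for t in tokens)
    has_particle = any(t in PARTICLES for t in tokens)
    if has_wh and has_particle:
        return "CZYQWORD"  # ambiguous case, resolved by hand for Yiddish
    if has_wh:
        return "QWORD"
    if has_particle:
        return "CZY"
    return "NOQWORD"


def to_generic(label, manual=None):
    """Map a raw label to GENERIC_QWD.

    For Yiddish, `manual` carries the hand-assigned label for CZYQWORD rows;
    for other languages CZYQWORD is merged with QWORD.
    """
    if label == "CZYQWORD":
        return manual if manual is not None else "QWORD"
    return label


print(classify([TSI, "איז", "ער", "?"]))
print(to_generic(classify([TSI, VOS, "?"])))
```

Keeping the ambiguous CZYQWORD label separate until the last step is what allows the Yiddish rows to be corrected by hand while the other languages fall back to QWORD.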
RHET: The question is likely to have rhetorical semantics, that is, one of the strings "דען", "דענ", "טאַקע", "טאקע", "טאַקי", "טאַק", "טאַקיי" (representing the rhetorical particles take and den in various spelling variants) is found in the text. NA - not attested, YRHET - rhetorical particles found.
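A check of this kind can be sketched in a few lines of Python. The marker list below is a subset of the spelling variants named above. Plain substring matching is an assumption, since the original script may have matched whole words only, and the function name is hypothetical.

```python
# Subset of the spelling variants of den and take (assumption: substring match).
RHET_MARKERS = ("דען", "דענ", "טאַקע", "טאקע", "טאַקי")


def rhet_label(quote):
    """Return "YRHET" if any marker occurs in the question string, else "NA"."""
    return "YRHET" if any(marker in quote for marker in RHET_MARKERS) else "NA"


print(rhet_label("azoy?"))
print(rhet_label("x " + RHET_MARKERS[0] + " y ?"))
```

Substring matching can over-generate when a marker happens to occur inside a longer word, which is one reason to treat the label as "likely rhetorical" rather than as a definitive annotation.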